10/26/2021
The Final Day: where we go from here...
Visual Processing, Situational Knowledge, Cognitive Systems, Representations, Art, Society, Technology, Robots

The Final Day of ICA4 began with two thought-provoking lectures from **Shimon Ullman** and **Zaven Paré**, discussing how humans extract information from complex scenes and the peculiar nature of the relationship between humans and robots. Paré then presented a provocative example of an existing robot designed to address isolation in a contemporary urban environment.

The Fellows then took a break to explore the 3D virtual campus of the Paris IAS, called Teemew Campus Alpha, in which lounges, public seminar rooms, private meeting rooms and workshops are accessible to ICA4 Fellows and Mentors. Each Academia member has an avatar, created from their own image, which is used to navigate the IAS cyber campus.

The Fellows then discussed their follow-up tasks and next steps. The first session of ICA4 in Paris wrapped up with a presentation from, this time, the Fellows themselves! They highlighted questions to explore further, ideas for research papers, and an interdisciplinary manifesto for Artificial Intelligence, and finally reflected on the key takeaways from the entire event.

The intellectually intense series of events concluded with cocktails at the Paris IAS, marking the very beginning of the scientific adventures yet to come. The ICA4 Fellows will continue to collectively explore seemingly never-ending questions by combining diverse perspectives on Intelligence and Artificial Intelligence, ultimately discovering and shaping how such complex matters should, and will, be embedded within societies...

The top-down and bottom-up in visual processing

Presented by Shimon Ullman

A major question in visual processing is how humans extract information from complex scenes. Images often tell us a story! Extracting such a narrative from a complex scene is a sophisticated task. There is a great deal of cultural and situational knowledge that serves as the background for directing attentional and visual processing resources.

Deep neural networks have made substantial progress in certain types of visual processing tasks, but they come with massive data requirements. Merely identifying objects and their characteristics is not enough to account for the relationships between objects. Humans are very fast at identifying the important features in an image that relate to structure and narrative. An interesting set of psychophysics studies examines how people extract information as a function of exposure time to an image. What we learn from these studies is that there is a substantial amount of cross-talk between the visual and cognitive systems. It's not the case that the visual system simply analyses a scene and then sends the processed information upstream. Rather, some visual information goes up to the cognitive centres, that information is used to direct queries carried out by the visual system, and the results are sent back up to the cognitive centres in an iterative process.

Can we model this process partially in an artificial system as an “unfolded RNN” that involves both bottom-up and top-down layers? In such a system, there are both symbolic and embedded representations. The inputs for the models are the images themselves and a set of instructions. The instructions are queries structured as vectors coded over objects and properties. The algorithm returns a correct answer when it pulls the queried information from the image. We can interpret what’s happening here symbolically as a program or sequential set of instructions for extracting information from an image.
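To make the idea concrete, here is a minimal sketch, in plain NumPy, of what such an unfolded bottom-up/top-down loop could look like. Everything in it (the function names `bottom_up` and `top_down`, the random weight matrices, the fixed number of unfolding steps) is an illustrative assumption rather than the architecture from the talk; a real model would use learned visual and recurrent layers.

```python
# A minimal sketch of the bottom-up/top-down loop described above.
# All names and shapes are illustrative assumptions, not the actual
# architecture from the talk.
import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_FEAT, D_QUERY = 64, 32, 16

# Stand-ins for learned weights.
W_up = rng.standard_normal((D_FEAT, D_IMG)) * 0.1              # bottom-up: image -> features
W_down = rng.standard_normal((D_IMG, D_FEAT + D_QUERY)) * 0.1  # top-down: features + query -> attention

def bottom_up(image, attention):
    """One bottom-up pass: extract features from the attended image."""
    return np.tanh(W_up @ (image * attention))

def top_down(features, query):
    """One top-down pass: turn current features and the query into a new attention map."""
    logits = W_down @ np.concatenate([features, query])
    return 1.0 / (1.0 + np.exp(-logits))                       # soft attention in [0, 1]

def answer_query(image, query, steps=3):
    """Unfold the loop for a fixed number of steps, like an unfolded RNN."""
    attention = np.ones(D_IMG)                                 # start by attending everywhere
    features = np.zeros(D_FEAT)
    for _ in range(steps):
        features = bottom_up(image, attention)                 # visual system sends info up
        attention = top_down(features, query)                  # cognitive system directs the next look
    return features                                            # read the answer out of the final features

image = rng.standard_normal(D_IMG)
query = rng.standard_normal(D_QUERY)                           # e.g. a vector coding an object/property query
print(answer_query(image, query).shape)                        # (32,)
```

The key design point the sketch tries to capture is that the query does not touch the image directly: it only shapes where the next bottom-up pass looks, mirroring the iterative cross-talk described above.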

A major challenge is combinatorial generalisation. The same structure can be instantiated in many ways (compare “A brown dog chasing a terrified kitten” with “A large cat chasing a furry squirrel”). One way to test this is to leave out object/property pairs from the training set and test on them, as sketched below. The combined bottom-up/top-down system does well on the left-out data, but the bottom-up-only system performs poorly.
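A rough sketch of that leave-out protocol, with made-up object and property lists: the held-out pairs are chosen so that each object and each property still appears somewhere in training, just never in that combination, so the test probes recombination rather than unseen vocabulary.

```python
# Sketch of the leave-out evaluation described above: train on most
# object/property pairs but hold out specific combinations, then test
# whether a model generalises to pairs it never saw together.
# The object and property lists are made up for illustration.
from itertools import product

objects = ["dog", "cat", "squirrel", "kitten"]
properties = ["brown", "large", "furry", "terrified"]

all_pairs = set(product(objects, properties))

# Hold out combinations whose parts each appear elsewhere in training.
held_out = {("dog", "large"), ("cat", "furry"), ("squirrel", "terrified")}

train_pairs = all_pairs - held_out

# Sanity check: every held-out object and property still occurs in
# training, just never in that particular combination.
train_objects = {o for o, _ in train_pairs}
train_properties = {p for _, p in train_pairs}
assert all(o in train_objects and p in train_properties for o, p in held_out)

print(f"train: {len(train_pairs)} pairs, test: {len(held_out)} held-out pairs")
```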

There is a broader set of questions on modelling and understanding. Humans have a very high-level understanding of a concept like “drinking” that is very hard to imagine arising from a purely bottom-up model. More data, even massive amounts of data, might not be sufficient without some higher-order structure.

After the presentation, the ICA4 Fellows asked questions about how images are embedded in actions and cultural contexts, the relationship between ontologies and bottom-up networks, and how visual processing in non-humans informs our thinking on the role of abstract reasoning in visual processing.

The role of the robot in society

Presented by Zaven Paré

Art can tell us something about society and technology. Once robots are embedded in contexts, they take on new characteristics – even if they are manufactured to be the “same,” they are changed by their environments. Different robots serve different functions in different societies. An important potential role for robots is to respond to isolation. Robots in the deep sea and in space are examples of how these systems can generate human-artificial interaction in conditions of isolation. The Gatebox product is a provocative example of an existing robot designed to address isolation in a contemporary urban environment.

Scribe: Mike Livermore

Chair: Alex Cayco Gajic

The Fellows then discussed questions related to the importance of isolation, what it means for a robot to offer companionship, and the social meaning of the different anthropomorphic forms projected onto robots...

By Atrina Oraee